
Avoid table scans on registry #1330

Merged — 2 commits, merged Jul 29, 2024
Conversation

@bryanjos (Contributor)

Related to #1329, we noticed that looking in the registry would take a long time, making the unbounded concurrency issue worse as processes would wait for the lookups here to finish. This removes the table scan and replaces it with a MapSet instead.
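A minimal sketch of the shape of the change, assuming a duplicate-keyed Registry; the registry name, doc ID variables, and function names here are hypothetical, not the PR's actual code:

```elixir
# Hypothetical before/after sketch; names are assumptions.

# Before: Registry.select/2 walks every partition of the registry
# table to find matching entries — effectively a full table scan.
defp entries_via_select(registry, doc_id) do
  Registry.select(registry, [
    {{:"$1", :"$2", :"$3"}, [{:==, :"$1", doc_id}], [{{:"$2", :"$3"}}]}
  ])
end

# After: deduplicate the IDs with a MapSet, then use keyed lookups.
# Registry.lookup/2 goes through the table key, so no scan is needed.
defp entries_via_lookup(registry, doc_ids) do
  doc_ids
  |> MapSet.new()
  |> Enum.flat_map(fn doc_id -> Registry.lookup(registry, doc_id) end)
end
```

The MapSet both deduplicates the IDs and keeps the number of `lookup` calls bounded by the number of distinct documents rather than the size of the registry.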

@bryanjos bryanjos requested a review from a team July 23, 2024 16:29
@benwilson512 (Contributor) commented Jul 23, 2024

Hey @bryanjos, can you post a benchee script or similar somewhere that compares this? The select should be using only the table key, so in theory both select and your iterative approach should be O(n log m). That assumes :ets is smart enough, though; maybe it isn't.

It'd be good to see some specific examples to demonstrate this.
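A rough outline of the kind of benchee comparison being requested; the registry name, entry counts, and setup are all assumptions for illustration, not the script from the gist:

```elixir
# Hypothetical benchee sketch; registry setup is an assumption.
Mix.install([{:benchee, "~> 1.3"}])

{:ok, _} = Registry.start_link(keys: :duplicate, name: BenchRegistry)

doc_ids = for i <- 1..1_000, do: "doc-#{i}"

# Register several duplicate entries per key from separate processes,
# since Registry.register/3 registers the calling process.
for id <- doc_ids, _dup <- 1..10 do
  Task.start(fn ->
    Registry.register(BenchRegistry, id, %{})
    Process.sleep(:infinity)
  end)
end

Benchee.run(%{
  "Using select" => fn ->
    Enum.flat_map(doc_ids, fn id ->
      Registry.select(BenchRegistry, [
        {{:"$1", :"$2", :"$3"}, [{:==, :"$1", id}], [{{:"$2", :"$3"}}]}
      ])
    end)
  end,
  "Using MapSet and lookup" => fn ->
    doc_ids
    |> MapSet.new()
    |> Enum.flat_map(&Registry.lookup(BenchRegistry, &1))
  end
})
```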

@bryanjos (Contributor, Author)

@benwilson512 I'll try to put that together.

A teammate made this change, so I might not have the full picture. I think the issue stems from the match spec that select is using, combined with the fact that the registry has duplicate keys.

There's a note at the bottom of the docs for select that I think summarizes the issue: "Note that for large registries with many partitions this will be costly as it builds the result by concatenating all the partitions."

@bryanjos (Contributor, Author)

@benwilson512 I put together a benchee script and recorded the output in this gist. The proposed change is definitely faster, but it does use more memory.

@koudelka commented Jul 24, 2024

in practice, i believe lookup should be O(log(n)), and select should be O(n), both multiplied by the number of registry partitions. it's basically a full table scan vs an index tree traversal lookup. the truly pathological worst case for a lookup on a duplicate_bag would be when all tuples in the table have the same key, which is when it'd become O(n).

it'd be a neat PR into the erlang codebase to (compile-time?) warn that the match spec [{:==, :"$1", id}] is bad mojo.

for what it's worth, this was a service killer for us; it brought down entire nodes. whatever is going on under the covers with select seems to entirely lock up the VM at a not-unreasonable level of concurrency; it wouldn't even service disterl for rpc health checks. i'm guessing there's something in ets that doesn't yield back to the scheduler (yay bifs). :(
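The lookup-vs-select distinction above can be demonstrated directly on a bare :ets duplicate_bag table; this is an illustrative sketch (table name and sizes are made up), not code from the PR:

```elixir
# Illustration of keyed lookup vs guard-based select on a duplicate_bag.
table = :ets.new(:demo, [:duplicate_bag])

# 1_000 keys, 3 duplicate objects each.
for i <- 1..1_000, j <- 1..3, do: :ets.insert(table, {i, {:entry, j}})

id = 500

# Keyed lookup: traverses the key index, touching only `id`'s objects.
keyed = :ets.lookup(table, id)

# Guard-based select with the [{:==, :"$1", id}] style of match spec:
# same result, but it visits all 3_000 objects in the table.
scanned =
  :ets.select(table, [
    {{:"$1", :"$2"}, [{:==, :"$1", id}], [{{:"$1", :"$2"}}]}
  ])

Enum.sort(keyed) == Enum.sort(scanned)
```

Both calls return the same three tuples for `id`; only the amount of work differs, which matches the O(log n) vs O(n) characterization above.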

@benwilson512 (Contributor)

Hey @koudelka, sorry to hear that it caused such issues. I'm happy to move forward with the PR. There is still a linear part to this change, because we have to call lookup n times, once for each returned doc ID. However, this is probably still faster because n is wildly less than the total size of the registry, whereas select has to scan the entire registry.

I have a few "code golf" items I'll remark on and then we can head towards a merge.

```elixir
  Map.put_new_lazy(acc, doc_id, fn ->
    Map.update!(doc, :initial_phases, &PipelineSerializer.unpack/1)
  end)
end
```

The structure here, where you first uniq the doc_ids and then reduce into a map with Map.put_new_lazy, seems redundant to me. The MapSet already guarantees that each doc_id will only exist once, so Map.put_new_lazy will never actually be lazy.

Beyond that, in my experiments with the MapSet and Map.new code back in the day, it was generally faster to build a list of key-value pairs and then call Map.new on that list. So perhaps you could benchmark using a list as the accumulator instead of a map and then piping it to Map.new?
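The suggested refactor might look roughly like this; `fetch_doc/1` is a hypothetical helper standing in for however the PR retrieves each document, and field names mirror the diff snippet above:

```elixir
# Sketch of the list-accumulator suggestion; fetch_doc/1 is hypothetical.
# Build {key, value} pairs in a list, then materialize the map once with
# Map.new/1 instead of reducing into a map with Map.put_new_lazy/3.
docs =
  doc_ids
  |> MapSet.new()
  |> Enum.map(fn doc_id ->
    doc = fetch_doc(doc_id)
    {doc_id, Map.update!(doc, :initial_phases, &PipelineSerializer.unpack/1)}
  end)
  |> Map.new()
```

Since the MapSet already deduplicates the keys, no collision handling is needed when the list is converted to a map.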

@bryanjos (Contributor, Author)

@benwilson512 Looks like that made a small difference in the right direction. I'll update the PR to use a list as the accumulator and pipe it into Map.new.

Operating System: macOS
CPU Information: Apple M1 Pro
Number of Available Cores: 10
Available memory: 16 GB
Elixir 1.17.0
Erlang 27.0
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 42 s

Benchmarking Using MapSet and lookup ...
Benchmarking Using list accumulator ...
Benchmarking Using select ...
Calculating statistics...
Formatting results...

Name                              ips        average  deviation         median         99th %
Using list accumulator           9.25      108.11 ms     ±6.78%      104.94 ms      149.45 ms
Using MapSet and lookup          9.05      110.49 ms     ±3.31%      109.03 ms      126.19 ms
Using select                   0.0125    79835.89 ms     ±0.00%    79835.89 ms    79835.89 ms

Comparison:
Using list accumulator           9.25
Using MapSet and lookup          9.05 - 1.02x slower +2.38 ms
Using select                   0.0125 - 738.49x slower +79727.79 ms

Memory usage statistics:

Name                            average  deviation         median         99th %
Using list accumulator        152.54 MB     ±0.01%      152.53 MB      152.60 MB
Using MapSet and lookup       158.87 MB     ±0.00%      158.87 MB      158.87 MB
Using select                   31.18 MB     ±0.00%       31.18 MB       31.18 MB

Comparison:
Using list accumulator        152.53 MB
Using MapSet and lookup       158.87 MB - 1.04x memory usage +6.33 MB
Using select                   31.18 MB - 0.20x memory usage -121.36031 MB

@benwilson512 (Contributor)

I am very puzzled by the roughly 5x difference in memory usage between what we have today and the proposed changes. Let me see if I can poke at this and figure out the cause.

@bryanjos (Contributor, Author)

I'm thinking it's probably due to the choice of data structures and the size of the examples. There's probably a tweak that could be made to balance out space and time.

@benwilson512 (Contributor) left a review:

Tested and looks great, thanks!

@benwilson512 benwilson512 merged commit 5705d9a into main Jul 29, 2024
9 checks passed
@benwilson512 benwilson512 deleted the bryanjos/registry_fix branch July 29, 2024 15:26